AITopics | good generalization

Collaborating Authors

good generalization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

94b472a1842cd7c56dcb125fb2765fbd-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 20:56:58 GMT

classification, probability, regime, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report > Experimental Study (0.65)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

94b472a1842cd7c56dcb125fb2765fbd-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-17-2025, 02:18:37 GMT

artificial intelligence, classification, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Reviews: The Description Length of Deep Learning models

Neural Information Processing SystemsOct-7-2024, 08:42:11 GMT

Summary: The paper considers the problem of minimum description length of any model - resting on the conjecture (as supported by Solomonoff's general theory of inference and the Minimum Description Length) that a good model must compress the date along with its parameters well. Thus generalization is heuristically linked to compressive power of the model. Relating to the "Chaitin's hypothesis [Cha07] that "comprehension is compression" any regularity in the data can be exploited both to compress it and to make predictions", the authors focus only on the compression problem . Detailed Comments: Clarity: The paper in well written and presents a good survey of the related work. Originality: There is no theory in the paper.

deep learning model, description length, generalization, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

On information captured by neural networks: connections with memorization and generalization

Harutyunyan, Hrayr

arXiv.org Artificial IntelligenceJun-28-2023

Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.

accuracy test accuracy, adjusted supervision complexity, good generalization, (17 more...)

arXiv.org Artificial Intelligence

2306.15918

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

Bayes Complexity of Learners vs Overfitting

Głuch, Grzegorz, Urbanke, Rudiger

arXiv.org Artificial IntelligenceMar-13-2023

We introduce a new notion of complexity of functions and we show that it has the following properties: (i) it governs a PAC Bayes-like generalization bound, (ii) for neural networks it relates to natural notions of complexity of functions (such as the variation), and (iii) it explains the generalization gap between neural networks and linear schemes. While there is a large set of papers which describes bounds that have each such property in isolation, and even some that have two, as far as we know, this is a first notion that satisfies all three of them. Moreover, in contrast to previous works, our notion naturally generalizes to neural networks with several layers. Even though the computation of our complexity is nontrivial in general, an upper-bound is often easy to derive, even for higher number of layers and functions with structure, such as period functions. An upper-bound we derive allows to show a separation in the number of samples needed for good generalization between 2 and 4-layer neural networks for periodic functions.

artificial intelligence, complexity, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.07874

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Statistical learning theory and empirical risk

#artificialintelligenceJan-2-2023, 10:00:17 GMT

Here, I'll be giving an overview and theoretical concepts of the statistical learning. Supervised learning can play a key role in learning from examples. From this algorithm, useful information can be easily extracted from large datasets, the problem of learning from examples consecutively involves approximating functions from a sparse and noisy data. In supervised learning, network is trained on a dataset of the form, T {xk, dk} from k 1 to Q. It is observed that using MLP multilayer perceptron with sufficient number of hidden neurons, it is possible to approximate a given function to any arbitrary degree of accuracy.

approximate function, generalization, training data, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Add feedback

Explaining generalization in deep learning: progress and fundamental limits

Nagarajan, Vaishnavh

arXiv.org Machine LearningOct-17-2021

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergece-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

satisfy class-aggregated calibration, stochastic gradient descent, training set size figure 8, (12 more...)

arXiv.org Machine Learning

2110.08922

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.87)
Government > Regional Government > North America Government > United States Government (0.67)
Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

On the Inductive Bias of a CNN for Orthogonal Patterns Distributions

Brutzkus, Alon, Globerson, Amir

arXiv.org Machine LearningFeb-22-2020

Training overparameterized convolutional neural networks with gradient based methods is the most successful learning method for image classification. However, its theoretical properties are far from understood even for very simple learning tasks. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent. We empirically identify a novel phenomenon where the dot-product between the learned pattern detectors and their detected patterns are governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and prove that PSI holds for a simple setup with two points in the training set. Furthermore, we prove that if PSI holds, stochastic gradient descent has sample complexity $O(d^2\log(d))$ where $d$ is the filter dimension. In contrast, we show a VC dimension lower bound in our setting which is exponential in $d$. Taken together, our results provide strong evidence that PSI is a unique inductive bias of stochastic gradient descent, that guarantees good generalization properties.

artificial intelligence, inductive bias, machine learning, (16 more...)

arXiv.org Machine Learning

2002.09781

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation

Mitra, Partha P

arXiv.org Machine LearningJun-9-2019

Traditionally in regression one minimizes the number of fitting parameters or uses smoothing/regularization to trade training (TE) and generalization error (GE). Driving TE to zero by increasing fitting degrees of freedom (dof) is expected to increase GE. However modern big-data approaches, including deep nets, seem to over-parametrize and send TE to zero (data interpolation) without impacting GE. Overparametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention. Regression and classification algorithms have been shown that interpolate data but also generalize optimally. An interesting related phenomenon has been noted: the existence of non-monotonic risk curves, with a peak in GE with increasing dof. It was suggested that this peak separates a classical regime from a modern regime where over-parametrization improves performance. Similar over-fitting peaks were reported previously (statistical physics approach to learning) and attributed to increased fitting model flexibility. We introduce a generative and fitting model pair ("Misparametrized Sparse Regression" or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dof's to match the data generative model and thus provides good generalization. This complicates the interpretation of overfitting peaks as separating a "classical" from a "modern" regime. Data interpolation itself cannot guarantee good generalization: we need to study the interpolation with different penalty terms. We present analytical formulae for GE curves for MiSpaR with $l_2$ and $l_1$ penalties, in the interpolating limit $\lambda\rightarrow 0$.These risk curves exhibit important differences and help elucidate the underlying phenomena.

artificial intelligence, generalization error, machine learning, (16 more...)

arXiv.org Machine Learning

1906.03667

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Shehroz Khan's answer to What are the cons and disadvantages of using deep learning? - Quora

#artificialintelligenceNov-27-2017, 06:30:26 GMT

Too much hard work needed to make a model work, if at all it works:-) To much effort is required to change those trivial network settings and configurations. Knowledge of'black-art' to tune/initialize parameters. Lot of computational time and memory is needed, forget to run deep learning programs on a laptop or PC, if your data is large. Lot of book-keeping is needed to analyze the outcomes from multiple deep learning models you are training on. Filters produced by the deep network can be hard to interpret.

artificial intelligence, deep learning, machine learning, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback